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Abstract 

We produce the first example of bounding total variation distance to stationarity 
and estimating mixing times via orthogonal polynomials diagonalization of discrete 
reversible Markov chains, the Karlin-McGregor approach. 

1 Introduction 

If P is a reversible Markov chain over a sample space Q, and tt is a reversibility function 
(not necessarily a probability distribution), then P is a self-adjoint operator in £ 2 (tt), 
the space generated by the inner product 

< f,9 >tt= f( x )9(x)7r{x) 

induced by tt. If P is tridiagonal operator (i.e. a nearest-neighbor random walk) on f2 = 
{0, 1,2,...}, then it must have a simple spectrum, and is diagonalizable via orthogonal 
polynomials as it was studied in the 50's and 60's by Karlin and McGregor, see [2], 
[8]. There the extended eigenfuctions Qj(X) (Qo = 1) are orthogonal polynomials with 
respect to a probability measure if; and 

Pt(i,j) = J X t Q i (X)Q j (X)dt/j(X) Vi, j G n, 

where TTj (ttq = 1) is the reversibility measure of P. 

In this paper we are testing a possibility of calculating mixing rates using Karlin- 
McGregor diagonalization with orthogonal polynomials. In order to measure the rate 
of convergence to a stationary distribution, the following distance is used. 

Definition 1. If fi and v are two probability distributions over a sample space f2, then 
the total variation distance is 

\\v - h\\tv = o y2 WW - = su p \ u ( a ) - m(^)i 

Observe that the total variation distance measures the coincidence between the distri- 
butions on a scale from zero to one. 
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If p = Y^kLo^k < °o, then v = ^tt is the stationary probability distribution. If in 
addition, the aperiodic nearest neighbor Markov chain originates at site i, then the 
total variation distance between the distribution m = HqP and v is given by 

/ X'Q^Q^d^X) , 
J (-1,1) 

as measure tj) contains a point mass of weight ^ at 1, see [3J. 

The rates of convergence are quantified via mixing times. In the case of a Markov 
chain over an infinite state space with a unique stationary distribution, the notion of 
a mixing time depends on the state of origination of the chain. 

Definition 2. Suppose P is a Markov chain with a stationary probability distribution 
v that commences at Xq = i. Given an e > 0, the mixing time t m i x {e) is defined as 

tmix{e) = min{t : \\v - Im\\tv < e} 



Wtv 



It. 



In the case of a nearest-neighbor process on = {0, 1,2,...} commencing at i, the 
corresponding mixing time has the following simple expression in orthogonal polyno- 
mials 



mm 



(-i.i) 



A*Q<(A)Q i (A)#(A) 



< 2e 



Observe that the above expression is simplified when i = 0. Here we concentrate on 
calculating mixing times for simple positive recurrent nearest-neighbor Markov chains 
over 0, originating from i = 0. Our main result concerns the distance to stationarity 
for a simple random walk with a drift. In the main theorem and its corollary we will 
explore the following Markov chain 
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Theorem 1. Suppose the above Markov chain begins at the origin, i = 0. Consider 
the orthogonal polynomials Q n for the chain. Then the integral ^ X t Q n (X)dip(X) 
can be expressed as 



(l+q-p)(q+r)-q 
(l+q-p)(q+r) 



q 

q+r 



t+n 
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P 
q+r 



2ni J\z\=l 

and the total variation distance \\u — HtWxv * s bounded above by 

t 



(^pq{z+z- 1 )+r) t z n (z-z- 1 ) 



P r+U+q-p} \ f pp r-(l+g-p) 
q 2(q+r) j \ Z \J q 2(q+r) 



dz 



q + r 



+ B{r + 2^p-q)\ 
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0( 

2p) ~~ f, Tf j+P+j 

e J, 0, i/ie mixing time 



^ A = { \tr q "mX) and B = n 7 E^ T7ff %E^ - T/iere/ore ' ta ^ 



where m(p, q) = max (r + 2^/pq), ( 



log(e) 
log m(p, g) 



Observe that in the above complex integral all three finite poles are located inside 
the unit circle. Thus we only need to consider a pole at infinity. 

The proof is provided in section [3l The result in Theorem [T] is the first instance the 
Karlin-McGregor orthogonal polynomials approach is used to estimate mixing rates. As 
it was suggested in [3] we would like the approach to work for a larger class of reversible 
Markov chains over an infinite state space with a unique stationary distribution. There 
is an immediate corollary (see section [3j) : 

Corollary. If ^ > r + 2^pq, 



\W-fi t \\ TV >A\-^-\ -B(r + 2 y /pq) t 

for t large enough, i. e. we have a lower bound of matching order. 

Observe that one can easily adjust these results for any origination site Xq = i. In 
the next section we will compare the above Karlin-McGregor approach to some of the 
classical techniques for estimating the "distance to stationarity" \\u — fJ-tWxv 



2 Comparison to the other techniques 

For the case of geometrically ergodic Markov chains, there are several techniques that 
produce an upper bound on the distance to stationarity that were developed specifically 
for the cases when the sample space is large, but finite. These methods are not directly 
applicable to chains on general state spaces. The coupling method stands out as the 
most universal. Here we compare the geometric rate in Theorem Q] to the one obtained 
via a classical coupling argument. Then we explain why other geometric ergodicity 
methods based on renewal theory will not do better than coupling. See [6] and [3] for 
detailed overview of geometric convergence and coupling. 

2.1 Geometric convergence via coupling 

Consider a coupling process (Xt,Yt), where Xq = as in Theorem [TJ while Yq is 
distributed according to the stationary distribution v = ~ir. A classical Markovian 
coupling construction allows Xt and Yt evolve independently until the coupling time 
Tcoupling = min{t : X t = Y t }. It is natural to compare P(j couv[ing > t) to P(r > t), 
where r = min{£ : Yt = 0} is a hitting time, as the chain is positive recurrent. 
Now, simple combinatorics implies, for k > n, 

P iT = k \Y = n)= -u-m M-t ^ + V 

. . f-f »!(» +n)\j\ 

2i+]=k—n 
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Therefore 

1 E ( E H 

k: k>t \i,j,n: 2i+j=k~n 



p{r>t)<— y y „., p i+n gv 



where '<' appears because itq = 1 < •=, but it does not change the asymptotic rate 
of convergence, i.e. we could write instead of '<'. The right hand side can be 
rewritten as 

k 



r u. „• — n \ J / 



P P k: k>t j=0 



for ^(m) = P(E > m/2), where E is a binomial random variable with parameters 
-f-V Now, by Cramer's theorem, £(m) ~ e [log2+ 5 1o sp+5 i°g(i-p)l™ and there . 



fore 



P(r > t) ~ - J] (r + 2^) fc = " + 2V ^. 2 (r + 2^)* (1) 

Recall that in the Corollary, if q is sufficiently larger than r and p, then (j^^j domi- 
nates (r + 2-y/pg)', and the total variation distance 



t 

t 



\\v-K\\ TV = A\-±-\ ±£(r + 2VM)\ 

where A and are given in Theorem [T] of this paper. Thus we need to explain why, 
when q is sufficiently large, in the equation (pQ), we fail to notice the dominating term 

of j ■ In order to understand why, observe that the second largest eigenvalue 

( — originates from the difference between T couv n ng and r. In fact, E can reach 

state zero without ever sharing a site with Xt (they will cross each other, of course). 
Consider the case when p is either zero, or close to zero. There, the problem reduces to 
essentially coupling a two state Markov chain with transition probabilities p(0, 1) = 1 
and p(l,0) = -^pp. Thus the coupling time will be expressed via a geometric random 
variable with the failure probability of 

Of course, one could make the Markov chain P "lazier" by increasing r at the 
expense of p and q, while keeping proportion ^ fixed, i.e. we can consider P £ = 
jj^(P + £■!)■ This will minimize the chance of Xt and E missing each other, but this 
also means increasing (r + 2^/pq), and slowing down the rate of convergence in ([T|). 

In order to obtain the correct exponents of convergence, we need to redo the 
coupling rules as follows. We now let the movements of Xt and E be synchronized 
whenever both are not at zero (i.e. {Xt, E} H {0} = 0), while letting Xt and E move 
independently when one of them is at zero, and the other is not. Then at the hitting 
time r, either X t = Y t = and the processes are successfully coupled, or X t = 1 
and E = 0. In the latter case we are back to the geometric variable with the failure 
probability of That is, the only way for X t and E to couple would be if one of 
the two is at state and the other is at state 1. Using the set theory notations, if 
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{X t ,Y t } = {0, 1}, conditioning on {X t +i,Y t+ i} ^ {1,2} would give us 

{X t+l Y t+1 } = With probability Wr' 

^{0,1} with probability 

When > r + 2y/pq, the above modified coupling captures the order (^+p) • The 
coefficient A however is much harder to estimate using the coupling approach, while 
it is immediately provided in Theorem [TJ and its corollary. Take for example P = jt, 
r = yi and q = ^j. There > r + 2^/pq, and according to the Corollary, the lower 

bound of A (^-^f^j — B(r + 2^/pq) 1 and the upper bound of A (^^p J + B(r + 2 y /pq) t 
are of the matching order, and the oreder of convergence is tight 

91 / 9 V 39 / 7 x ' 



\\y -ih\\tv- m ^ jo ) ± ^{n 
2.2 Drift, minorization and geometric ergodicity 

The optimal "energy function" V(x) = (i) converts the geometric drift inequality 



in Meyn and Tweedie [6] Chapter 15 into equality 

E[V(X t+1 ) \X t = x] = {r + 2y/pq)V{x) + - (r + 2^m)) *c(x) 

thus confirming the geometric convergence rate of (r + 2^/pq) t for the tail probability 
P(t~c > t), where C = {0} is the obvious choice for the "small set", and tq is the 
hitting time. Once again all the challenge is at the origin. In fact there is only a trivial 
"minorization condition" when C = {0}. The minorization condition reads 

p(x, A) > eQ(A) Vx G C, icH, 

where, if C = {0}, the only choice for the probability measure Q is Q = 5%, and e = 1. 
With e = 1 the split of the Markov chain is trivial, and as far as the corresponding 
coupling goes, the only issue would be (as we mentioned before) to compute the tail of 
the hitting time min{t : (Xt, Yt) G C x C} when q is large. If C = {0, 1, . . . , k} for some 
k > 0, there is no minorization condition. In the latter case, estimating the hitting 
time min{t : (Xf,Yt) ECx C} is straightforward, but without minorization, this will 
not be enough to estimate the tail for the coupling time. The "splitting technique" 
will not work, rather a coupling approach of the preceding subsection to be pursued. 

The case of recurrent reflecting random walk (the M/M/T queue) had been con- 
sidered as one of the four benchmark examples in the geometric ergodicity theory (see 
[TJ and references therein). There, in the absence of the second largest eigenvalue of 

(^—~P[+p S Ji with r = 0, the rate of (2 y /pq) t was proven to be the optimal (see [5]). The 
methods in the theory of geometric convergence are in the most part based on the 
renewal theory (see jS], [TJ, [7] and references therein), and concentrate more on the 
tail probability for the hitting time tc in the splitting method. As for the Markov 
chain P considered in this paper, in the absence of a useful splitting of it, the approach 
that works is the coupling. While the coupling provides the right exponents, it does 
not necessarily produce the tight coefficients. 



5 



Y.V.Kovchegov 



Orthogonality and probability: mixing times 



3 The proof of Theorem H 

Proof. Since we require r > for aperiodicity, we will need to obtain the spectral 
measure ip via an argument similar to that of Karlin and McGregor in [2], where 
the case of r = was solved. The orthogonal polynomials are obtained via solving 
a simple linear recursion: Qq = 1, Q\ = A, and Q n (X) = ci(A)p™(A) + C2(A)/?2(A), 

where pi(X) = A ^^^p^' ^ an< ^ P%W = A T ^^, p ^ 4W are the roots of the 
characteristic equation for the recursion, and c\ = P2 ~ X and C2 = ~ pi ■ 

Now 7i"o = 1, 7r n = ^-n— (n > 1) and p = . Also, we observe that 

p 2 (A)|> A /| on[-l,r-2^g), 



MA)|<^J on(r + 2^M,l], 
|pz(A)| = «/| on [r-2^/pq,r + 2 y /pq], 



and p x p 2 = ^. 

The above will help us to identify the point mass locations in the measure ip since 
each point mass in tp occurs when Ylk^kQkW < oo. Thus we need to find all A £ 
[r + 2^fpq, 1] such that ci(A) = and all A G [— 1, r — such that 02(A) = 0. There 

are two roots, A = 1 and A = — . 



We already know everything about the point mass at A = 1: Qfc(l) = 1 for all k > 0, 

i±g=l 
q-p 



and p = YH^Lo^kQlfi) = 1 ^_ g „ p is the reciprocal of the point mass at A = 1. 



The only other point mass is at A = — ^7- One can verify that p\ y—-^ppj = — 
and Q k = > and therefore 



00 / \ 



(l + g-p)(g + r) 



(g + r) 2 - pq (1 + g - p)(q + r ) - q 



is the reciprocal of the point mass at A = — . 

It follows that the rest of the mass of tp (other than the two point masses) is spread 
inside [r — 2^/pq, r + 2^/pq] . In order to find the density of if) inside [r — 2^/pq, r + 2^/pq] 
we need to find (eo, (P — sI)~ 1 eo) for Im(s) 7^ 0, i.e. the upper left element in the 
resolvent of P. Let (ao(s), a\(s), . . . ) T = (P — sI)~ 1 eo, then 



-soo + ai = 1, and qa n ~i + (r — s)a n + pa n +i = 
Thus a n (s) = axpi{s) n + a 2 p2(s) n , where at = ""[gl^) and ° 2 = Sfe(5y- 
Since (ao, ai, . . . ) 6 ^ 2 (C, 7r), 

a nU/ ^0 as n — > +00 

V p n 
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Hence when |pi(s)| 7^ |/?2 (-s) | , either a\ = or a 2 = 0, and therefore 



p 



Ms) = p: — — + , , ^ (2) 

pi(s)-s p 2 [s) - s 



Now, because of the point masses at 1 and — i ao(s) = ji d f^_ z J can be expressed 
as 

g-p / 1 \ (l + 9-p)(g + r) -g / 1 \ / p(z)cfe 
a o( s ) = TTZ Z 1 1 + /1 , I Z\7Z , ^ 5 1 + 



l + q-p\l-sj {l + q-p)(q + r) ~ s J J (-1,1) z-s 1 

where <p(z) is an atom-less function. Next we will use the following basic property of 
Cauchy transforms Cf(s) = ^ J R that can be derived using the Cauchy integral 

formula, or similarly, an approximation to the identity formula 0: 

C+ - C- = I (3) 

Here C+f(z) = lim s _^ : / m ( s )> Cf(s) and C-f(z) = lim s ^ z . i m (s)<o C f (s) for all z 6 
E. The above equation (J3J) implies 

Z7T2 ys=x+ie : e-^U+ s=x— le : £^U+ 

for all x G (-1,1). Recalling £2), we express ip as ip(x) = 2 ^i{ P T{xf-x)tp%)-x) for 
x £ (r — 2y/pq, r + 2^/pq), which in turn simplifies to 



r)2 4W - if x G (r- 2 y /pq,r + 2 y /pq), 



(p( X ) = { 2ni((r+q)x+q){l-x) 

otherwise 

Let X = (r — 2^/pq,r + 2^/pq) denote the support interval, and let be its indicator 

function. Here 

<p(x)dx = — - — 
and one can check that 



W 1 + {l + q _ p){q + r) °~^ X > + 2^({r + q)x + q)(l-x) W) 

integrates to one. 



Observe that the residues of g(z) = ^- +q y z+q )(i^ z ) are 

Res(g(z),l)= Q ~ P and Res(g(z), Q A _ {I + q ~ p){q + r) - q 



l + q — p V q + rj {l + q — p){q + r 

in the principle branch of the log function. 



1 The curve in the integral does not need to be M for C+ — C_ = I to hold. 
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Now 

t+n 



and therefore, since ci = P2 ~ x and C2 = A ~ pl , 

f A*H mA/.-m - (i+g-PX.g+O-g f 9 V +n , 1 rr+2VM x t( Pi Pi \ HX 

J(-l,l) A VniAjd^^Aj - (1+9 _ p)(9+r) +2^iJr-2 v ^ A ~ J « A > 

where, if we let pi = ^ or z ^ n ^he l° wer semicircle and P2 = z for 2 in the 

upper semicircle, then 

j_ rr+2^/pq t /_£ pl_\ _ ( j_ , (^pq(z+z- 1 )+ryz" y /pq(l-z- 2 )dz 

2™Jr-2^M A \P2-X Pl -X) aA {\Jp) 2niJ\z\=l JI z _ {Vplj{z+z -l )+r) 

P \ 1 I (Jpq{z + z' 1 )+r) t z n (z-z- 1 ) 

* " -dz 



q + rj 2m J\ z \ =l ^ 



l~p r+(l+q-p) \ ( [p r-(l+q-p) 

q 2(q+r) J V q 2(q+r) 



Here the absolute value of the function in the last integral is bounded by M{r + 2^/pq) t 
with M = T — +(1+q _ p) 2 , — _ (i+q _ p) , . Therefore, plugging in the values of vr n , we 

V q 2(g+r) ) [}+ \J q 2(q+r) ) 

show that the distance to stationarity \\u — ^tWrv = \ J2^=o n n 1) ^ t Qn{X)dip(\) 
is bounded above by 

t 



where 



A= ( 1 + g ~ gXg + r ) ~ g n ( 3 V' _ (1 +9 + r) - g 



2(l + g-p)(g + r) ^ V? + *V (1 + q - p)(l - 2p) 



and 



/< = M f P ) f, . 1 \ l.~J 1.' + v^- 



2 \ 9 + r J V \/M _ P/ f 1 /p r +( 1 +g~p) N \ ( 1 _i_ /p y-(i+g-p) \ 

v v v« 2 («+ r ) A + v« 2 ("+ r ) J 

The above upper bound can be improved if one obtains a better estimate of the trigono- 
metric integrals involved in the sum. 

We conclude that t mix {e) = O ( lo ^g,g) ) as e j 0. □ 
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